Abstract

We consider how to make probability forecasts of binary labels. Our
main mathematical result is that for any continuous gambling strategy
used for detecting disagreement between the forecasts and the actual labels,
there exists a forecasting strategy whose forecasts are ideal as far
as this gambling strategy is concerned. A forecasting strategy obtained
in this way from a gambling strategy demonstrating a strong law of large
numbers is simplified and studied empirically.

1 Introduction 

Probability forecasting can be thought of as a game between two players,
Forecaster and Reality:

FOR n = 1, 2, ...:

Reality announces $x_n \in \mathbf{X}$.
Forecaster announces $p_n \in [0, 1]$.
Reality announces $y_n \in \{0, 1\}$.

On each round, Forecaster predicts Reality's move $y_n$, chosen from the label
space, which is always taken to be $\{0, 1\}$ in this paper. His move, the probability
forecast $p_n$, can be interpreted as the probability he attaches to the event $y_n = 1$.
To help Forecaster, Reality presents him with an object $x_n$ at the beginning of the
round, chosen from an object space $\mathbf{X}$.

Forecaster's goal is to produce $p_n$ that agree with the observed $y_n$. Various
results of probability theory, in particular limit theorems (such as the weak and
strong laws of large numbers, the law of the iterated logarithm, and the central
limit theorem) and large-deviation inequalities (such as Hoeffding's inequality),
describe different aspects of agreement between $p_n$ and $y_n$. For example,
according to the strong law of large numbers, we expect that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i) = 0. \qquad (1)$$
Such results will be called laws of probability and the existing body of laws of 
probability will be called classical probability theory. 

In Section 2 we formalize Forecaster's goal by adding a third player,
Skeptic, who is allowed to gamble at the odds given by Forecaster's probabilities.
We state a result from […] and [12] suggesting that Skeptic's gambling strategies
can be used as tests of agreement between $p_n$ and $y_n$ and that all tests of
agreement between $p_n$ and $y_n$ can be expressed as Skeptic's gambling strategies.
Therefore, the forecasting protocol with Skeptic provides an alternative way of
stating laws of probability.

As demonstrated in [12], many standard proof techniques developed in classical
probability theory can be translated into continuous strategies for Skeptic.
In Section 3 we show that for any continuous strategy $\mathcal{S}$ for Skeptic there
exists a strategy $\mathcal{T}$ for Forecaster such that $\mathcal{S}$ does not detect any
disagreement between the $y_n$ and the $p_n$ produced by $\mathcal{T}$. This result is a
"meta-theorem" that allows one to move from laws of probability to forecasting
algorithms: as soon as a law of probability is expressed as a continuous strategy
for Skeptic, we have a forecasting algorithm that guarantees that this law will
hold; there are no assumptions about Reality, who may play adversarially.

Our meta-theorem is of interest only if one can find sufficiently interesting
laws of probability (expressed as gambling strategies) that can serve as its
input. In Section 4 we apply it to the important properties of unbiasedness in the
large and unbiasedness in the small of the forecasts $p_n$ ((1) is an asymptotic
version of the former). The resulting forecasting strategy is automatically
unbiased, no matter what data $x_1, y_1, x_2, y_2, \ldots$ is observed.

In Section 5 we simplify the algorithm obtained in Section 4 and demonstrate its
performance on some artificially generated data sets.

2 The gambling framework for testing probability forecasts

Skeptic is allowed to bet at the odds defined by Forecaster's probabilities, and he 
refutes the probabilities if he multiplies his capital manyfold. This is formalized 
as a perfect-information game in which Skeptic plays against a team composed 
of Forecaster and Reality: 

Binary Forecasting Game I 
Players: Reality, Forecaster, Skeptic 



Protocol:

$\mathcal{K}_0 := 1$.

FOR n = 1, 2, ...:

Reality announces $x_n \in \mathbf{X}$.
Forecaster announces $p_n \in [0, 1]$.
Skeptic announces $s_n \in \mathbb{R}$.
Reality announces $y_n \in \{0, 1\}$.

$\mathcal{K}_n := \mathcal{K}_{n-1} + s_n (y_n - p_n)$.

Restriction on Skeptic: Skeptic must choose the $s_n$ so that his capital is
always nonnegative ($\mathcal{K}_n \ge 0$ for all $n$) no matter how the other players move.

This is a perfect-information protocol; the players move in the order indicated,
and each player sees the other players' moves as they are made. It specifies
both an initial value for Skeptic's capital ($\mathcal{K}_0 = 1$) and a lower bound on its
subsequent values ($\mathcal{K}_n \ge 0$).
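As an illustration only, the protocol of Binary Forecasting Game I can be simulated in a few lines of Python. The sketch omits the objects $x_n$, fixes the labels in advance (a real Reality may choose them adaptively), and represents Forecaster and Skeptic as plain functions of the past labels; the names `play_game_one`, `Forecaster` and `Skeptic` are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence

# Hypothetical type aliases (not from the paper): strategies see the past labels.
Forecaster = Callable[[List[int]], float]             # y_1..y_{n-1} -> p_n
Skeptic = Callable[[List[int], float, float], float]  # y_1..y_{n-1}, p_n, K_{n-1} -> s_n


def play_game_one(labels: Sequence[int], forecaster: Forecaster,
                  skeptic: Skeptic) -> List[float]:
    """Hypothetical helper: Binary Forecasting Game I without objects.

    Returns the capital path K_0, K_1, ..., K_N.
    """
    capital = [1.0]                 # K_0 := 1
    history: List[int] = []
    for y in labels:                # labels fixed in advance, revealed one per round
        p = forecaster(history)
        s = skeptic(history, p, capital[-1])
        capital.append(capital[-1] + s * (y - p))  # K_n := K_{n-1} + s_n (y_n - p_n)
        history.append(y)
    return capital


if __name__ == "__main__":
    # Forecaster always says 0.5; Skeptic stakes half of his current capital on y_n = 1.
    print(play_game_one([1, 1], lambda h: 0.5, lambda h, p, k: 0.5 * k))
    # prints [1.0, 1.25, 1.5625]
```

Betting $s_n = 0.5\,\mathcal{K}_{n-1}$ respects the restriction on Skeptic: since $|y_n - p_n| \le 1$, it gives $\mathcal{K}_n \ge 0.5\,\mathcal{K}_{n-1} > 0$.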

Our interpretation of Binary Forecasting Game I, which will be called the testing
interpretation, is that $\mathcal{K}_n$ measures the degree to which Skeptic has shown
Forecaster to do a bad job of predicting $y_i$, $i = 1, \ldots, n$.

2.1 Validity and universality of the testing interpretation 

As explained in [12], the testing interpretation is valid and universal in an
important sense. Let us assume, for simplicity, that objects are absent (formally,
that $|\mathbf{X}| = 1$). In the case where Forecaster starts from a probability measure
$P$ on $\{0, 1\}^\infty$ and obtains his forecasts $p_n \in [0, 1]$ as conditional probabilities
under $P$ that $y_n = 1$ given $y_1, \ldots, y_{n-1}$, we have a standard way of testing $P$
and, therefore, $p_n$: choose an event $A \subseteq \{0, 1\}^\infty$ (the critical region) with a
small $P(A)$ and reject $P$ if $A$ happens. The testing interpretation satisfies the
following two properties:

Validity: Suppose Skeptic's strategy is measurable and the $p_n$ are obtained from $P$;
the $\mathcal{K}_n$ then form a nonnegative martingale with respect to $P$. According to
Doob's inequality [14], for any positive constant $C$, $\sup_n \mathcal{K}_n \ge C$ with
$P$-probability at most $1/C$. (If Forecaster is doing a bad job according to
the testing interpretation, he is also doing a bad job from the standard
point of view.)

Universality: According to Ville's theorem ([12], §8.5), for any positive constant
$\epsilon$ and any event $A \subseteq \{0, 1\}^\infty$ such that $P(A) < \epsilon$, Skeptic has a
measurable strategy that ensures $\liminf_{n\to\infty} \mathcal{K}_n \ge 1/\epsilon$ whenever $A$
happens, provided the $p_n$ are computed from $P$. (If Forecaster is doing a bad
job according to the standard point of view, he is also doing a bad job
according to the testing interpretation.) In the case $P(A) = 0$, Skeptic
actually has a measurable strategy that ensures $\lim_{n\to\infty} \mathcal{K}_n = \infty$ on $A$.

The universality of the gambling scenario of Binary Forecasting Game I is its
most important advantage over von Mises's gambling scenario based on subsequence
selection; it was discovered by Ville [14].
2.2 Continuity of gambling strategies

In […] we constructed Skeptic's strategies that made him rich when the statement
of any of several key laws of probability theory was violated. The constructions
were explicit and led to continuous gambling strategies. We conjecture
that every natural result of classical probability theory leads to a continuous
strategy for Skeptic.

3 Defeating Skeptic 

In this section we prove the main (albeit very simple) mathematical result of 
this paper: for any continuous strategy for Skeptic there exists a strategy for 
Forecaster that does not allow Skeptic's capital to grow, regardless of what 
Reality is doing. Actually, our result will be even stronger: we will have Skeptic 
announce his strategy for each round before Forecaster's move on that round 
rather than making him announce his full strategy at the beginning of the game, 
and we will drop the restriction on Skeptic. Therefore, we consider the following 
perfect-information game that pits Forecaster against the two other players: 

Binary Forecasting Game II 
Players: Reality, Forecaster, Skeptic 
Protocol: 
$\mathcal{K}_0 := 1$.

FOR n = 1, 2, ...:

Reality announces $x_n \in \mathbf{X}$.

Skeptic announces continuous $S_n : [0, 1] \to \mathbb{R}$.

Forecaster announces $p_n \in [0, 1]$.

Reality announces $y_n \in \{0, 1\}$.

$\mathcal{K}_n := \mathcal{K}_{n-1} + S_n(p_n)(y_n - p_n)$.

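Theorem 1 Forecaster has a strategy in Binary Forecasting Game II that ensures
$\mathcal{K}_0 \ge \mathcal{K}_1 \ge \mathcal{K}_2 \ge \cdots$.

Proof Forecaster can use the following strategy to ensure $\mathcal{K}_0 \ge \mathcal{K}_1 \ge \cdots$:

• if the function $S_n(p)$ takes the value 0, choose $p_n$ so that $S_n(p_n) = 0$;

• if $S_n$ is always positive, take $p_n := 1$;

• if $S_n$ is always negative, take $p_n := 0$.

Since $S_n$ is continuous, the intermediate value theorem shows that these three cases
exhaust all possibilities. In the first case the capital does not change; in the
second, $S_n(1)(y_n - 1) \le 0$ because $y_n \le 1$; in the third, $S_n(0)\, y_n \le 0$ because
$y_n \ge 0$. Hence $\mathcal{K}_n \le \mathcal{K}_{n-1}$ on every round. ∎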
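A minimal Python sketch of a Forecaster strategy in the spirit of this proof, assuming $S_n$ is available as an ordinary continuous Python function. It uses a slight variant of the case analysis (our choice, because it needs only the endpoint values to settle the non-root cases): if $S_n(1) \ge 0$, the move $p_n = 1$ already gives $S_n(1)(y_n - 1) \le 0$; if $S_n(0) \le 0$, the move $p_n = 0$ gives $S_n(0)\, y_n \le 0$; otherwise $S_n(0) > 0 > S_n(1)$ and a zero is located by bisection. Since bisection only approximates the zero, the capital may increase by a negligible amount bounded by $|S_n(p_n)|$. The function name `defensive_forecast` is hypothetical.

```python
from typing import Callable


def defensive_forecast(S: Callable[[float], float], iters: int = 60) -> float:
    """Hypothetical helper: p in [0, 1] with S(p) * (y - p) <= 0 for y in {0, 1}.

    Variant of the case analysis in the proof of Theorem 1 (an assumption of
    this sketch); exact up to the bisection error.
    """
    s0, s1 = S(0.0), S(1.0)
    if s1 >= 0:          # p = 1: S(1) * (y - 1) <= 0 because y - 1 <= 0
        return 1.0
    if s0 <= 0:          # p = 0: S(0) * y <= 0 because y >= 0
        return 0.0
    lo, hi = 0.0, 1.0    # now S(0) > 0 > S(1): a zero exists (intermediate value theorem)
    for _ in range(iters):
        mid = (lo + hi) / 2
        s_mid = S(mid)
        if s_mid == 0:
            return mid
        if s_mid > 0:
            lo = mid     # the sign change lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    capital = 1.0
    for n in range(1, 6):
        S_n = lambda p, n=n: (-1) ** n * (0.3 - p)   # an arbitrary continuous S_n
        p_n = defensive_forecast(S_n)
        y_n = 1 if S_n(p_n) > 0 else 0                # Reality tries to help Skeptic
        capital += S_n(p_n) * (y_n - p_n)
        print(n, round(p_n, 6), capital)
```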

4 Examples of gambling strategies 

In this section we discuss strategies for Forecaster obtained by Theorem 1 from
different strategies for Skeptic; the former will be called defensive forecasting
strategies. There are many results of classical probability theory that we could
use, but we will concentrate on the simple strategy described in [12], p. 69, for
proving the strong law of large numbers.

If $S_n(p) = S_n$ does not depend on $p$, the strategy from the proof of Theorem 1
makes Forecaster choose

$$p_n := \begin{cases} 0 & \text{if } S_n < 0,\\ 1 & \text{if } S_n > 0,\\ 0 \text{ or } 1 & \text{if } S_n = 0. \end{cases}$$

The basic procedure described in [12] (p. 69) is as follows. Let $\epsilon \in (0, 0.5]$
be a small number (expressing our tolerance to violations of the strong law of
large numbers). In Binary Forecasting Game I, Skeptic can ensure that

$$\sup_n \mathcal{K}_n < \infty \implies \limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i) \le \epsilon \qquad (2)$$

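using the strategy $s_n = s_n^{\epsilon} := \epsilon \mathcal{K}_{n-1}$. Indeed, since

$$\mathcal{K}_n = \prod_{i=1}^{n} \bigl(1 + \epsilon (y_i - p_i)\bigr),$$

on the paths where $\mathcal{K}_n$ is bounded, say $\mathcal{K}_n \le C$, we have

$$\begin{aligned}
\prod_{i=1}^{n} \bigl(1 + \epsilon (y_i - p_i)\bigr) &\le C,\\
\sum_{i=1}^{n} \ln\bigl(1 + \epsilon (y_i - p_i)\bigr) &\le \ln C,\\
\epsilon \sum_{i=1}^{n} (y_i - p_i) - \epsilon^2 \sum_{i=1}^{n} (y_i - p_i)^2 &\le \ln C,\\
\epsilon \sum_{i=1}^{n} (y_i - p_i) &\le \ln C + \epsilon^2 n,\\
\frac{1}{n} \sum_{i=1}^{n} (y_i - p_i) &\le \frac{\ln C}{\epsilon n} + \epsilon
\end{aligned}$$

(we have used the fact that $\ln(1 + t) \ge t - t^2$ when $|t| \le 0.5$, together with
$(y_i - p_i)^2 \le 1$). If Skeptic wants to ensure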

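$$\sup_n \mathcal{K}_n < \infty \implies -\epsilon \le \liminf_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i) \le \limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i) \le \epsilon,$$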
he can use the strategy $s_n := (s_n^{\epsilon} + s_n^{-\epsilon})/2$, and if he wants to ensure

$$\sup_n \mathcal{K}_n < \infty \implies \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i) = 0, \qquad (3)$$

he can use a convex mixture of $(s_n^{\epsilon} + s_n^{-\epsilon})/2$ over a sequence of $\epsilon$
converging to zero. There are also standard ways of strengthening (3) to

$$\liminf_{n\to\infty} \mathcal{K}_n < \infty \implies \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i) = 0;$$

for details, see [12].
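The derivation of (2) can be checked numerically. The sketch below, with hypothetical helper names, computes the capital $\mathcal{K}_n = \prod_{i=1}^{n}(1 + \epsilon(y_i - p_i))$ of the strategy $s_n^{\epsilon}$ and of the mixture $(s_n^{\epsilon} + s_n^{-\epsilon})/2$ (each half played on its own account, so the mixture's capital is the average of the two), and evaluates both sides of the last inequality of the derivation with $C = \mathcal{K}_n$. The randomly generated forecasts and labels are test data only.

```python
import math
import random


def capital_eps(ys, ps, eps):
    """Hypothetical helper: capital of s_n = eps * K_{n-1}, i.e. prod_i (1 + eps (y_i - p_i))."""
    k = 1.0
    for y, p in zip(ys, ps):
        k *= 1 + eps * (y - p)
    return k


def mixture_capital(ys, ps, eps):
    """Hypothetical helper: capital of (s^eps + s^-eps)/2, half of K_0 on each account."""
    return 0.5 * capital_eps(ys, ps, eps) + 0.5 * capital_eps(ys, ps, -eps)


def check_bound(ys, ps, eps):
    """Return (mean of y_i - p_i, ln(K_n)/(eps n) + eps); needs 0 < eps <= 0.5."""
    n = len(ys)
    avg = sum(y - p for y, p in zip(ys, ps)) / n
    return avg, math.log(capital_eps(ys, ps, eps)) / (eps * n) + eps


if __name__ == "__main__":
    rng = random.Random(0)
    ps = [rng.random() for _ in range(1000)]
    # Labels deliberately biased towards 1, so Skeptic's capital grows.
    ys = [1 if rng.random() < min(1.0, p + 0.2) else 0 for p in ps]
    for eps in (0.5, 0.1, 0.01):
        avg, bound = check_bound(ys, ps, eps)
        print(eps, round(avg, 4), round(bound, 4), avg <= bound)
```

The inequality holds for every data sequence, not just on average, because each step of the derivation is a deterministic bound.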

In the rest of this section we will draw on the excellent survey [2]. We will
see how Forecaster defeats increasingly sophisticated strategies for Skeptic.

4.1 Unbiasedness in the large 

Following Murphy and Epstein […], we say that Forecaster is unbiased in the
large if (1) holds. Let us first consider the one-sided relaxed version of this
property:

$$\limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i) \le \epsilon. \qquad (4)$$



The strategy for Skeptic described above, $S_n(p) := \epsilon \mathcal{K}_{n-1}$, does not depend
on $p$ and is always positive, so it leads to Forecaster always choosing $p_n := 1$;
(4) is then satisfied in a trivial way.
Forecaster's strategy corresponding to the two-sided version

$$-\epsilon \le \liminf_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i) \le \limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} (y_i - p_i) \le \epsilon \qquad (5)$$

is not much more reasonable. Indeed, it can be represented as follows. The
initial capital 1 is split evenly between two accounts, and Skeptic gambles with
the two accounts separately. If at the outset of round $n$ the capital on the first
account is $\mathcal{K}^1_{n-1}$ and the capital on the second account is $\mathcal{K}^2_{n-1}$, Skeptic plays
$s_n^1 := \epsilon \mathcal{K}^1_{n-1}$ with the first account and $s_n^2 := -\epsilon \mathcal{K}^2_{n-1}$ with the
second account; his total move is

$$s_n = \frac{\epsilon}{2} \left( \prod_{i=1}^{n-1} \bigl(1 + \epsilon (y_i - p_i)\bigr) - \prod_{i=1}^{n-1} \bigl(1 + \epsilon (p_i - y_i)\bigr) \right).$$

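Therefore, Forecaster's move is $p_n := 1$ if

$$\sum_{i=1}^{n-1} \ln\bigl(1 + \epsilon (y_i - p_i)\bigr) > \sum_{i=1}^{n-1} \ln\bigl(1 + \epsilon (p_i - y_i)\bigr),$$

$p_n := 0$ if

$$\sum_{i=1}^{n-1} \ln\bigl(1 + \epsilon (y_i - p_i)\bigr) < \sum_{i=1}^{n-1} \ln\bigl(1 + \epsilon (p_i - y_i)\bigr),$$

and $p_n$ can be chosen arbitrarily in the case of equality. The limiting form of
this strategy as $\epsilon \to 0$ (using $\ln(1 + t) \approx t$ for small $t$) is: Forecaster's
move is $p_n := 1$ if

$$\sum_{i=1}^{n-1} (y_i - p_i) > 0,$$

$p_n := 0$ if

$$\sum_{i=1}^{n-1} (y_i - p_i) < 0,$$

and $p_n$ can be chosen arbitrarily in the case of equality.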

We can see that unbiasedness in the large does not lead to interesting forecasts:
Forecaster fulfils his task too well. In the one-sided case (4), he always
chooses $p_n := 1$, making

$$\sum_{i=1}^{n} (y_i - p_i)$$

as small as possible. In the two-sided case (5) with $\epsilon \to 0$, he manages to
guarantee that

$$\left| \sum_{i=1}^{n} (y_i - p_i) \right| \le 1. \qquad (6)$$

His goals are achieved with categorical forecasts, $p_n \in \{0, 1\}$.
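To see (6) concretely, here is a small simulation of the limiting two-sided strategy, assuming Reality sees $p_n$ before choosing $y_n$. Breaking ties by a random choice between 0 and 1 is our assumption (the text allows any choice), and the function name `run_limiting_forecaster` is hypothetical.

```python
import random
from typing import Callable


def run_limiting_forecaster(reality: Callable[[float], int], n_rounds: int,
                            seed: int = 0) -> float:
    """Hypothetical helper: play the eps -> 0 two-sided strategy against `reality`.

    Returns max over n of |sum_{i<=n} (y_i - p_i)|.
    """
    rng = random.Random(seed)
    d = 0.0        # running sum of (y_i - p_i)
    worst = 0.0
    for _ in range(n_rounds):
        if d > 0:
            p = 1.0
        elif d < 0:
            p = 0.0
        else:
            p = rng.choice([0.0, 1.0])   # tie: any forecast is allowed
        y = reality(p)                   # Reality moves after seeing p_n
        d += y - p
        worst = max(worst, abs(d))
    return worst


if __name__ == "__main__":
    noise = random.Random(42)
    print(run_limiting_forecaster(lambda p: noise.randint(0, 1), 10_000))  # at most 1.0
    print(run_limiting_forecaster(lambda p: 1 - int(p), 10_000))           # at most 1.0
```

The bound follows by induction: the running sum is an integer, and whenever it is nonzero the next round can only move it toward zero or leave it unchanged.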

In the rest of this section we consider the more interesting case where $S_n(p)$
depends on $p$.



4.2 Unbiasedness in the small 

We now consider a subtler requirement that forecasts should satisfy, which we
introduce informally. We say that the forecasts $p_n$ are unbiased in the small (or
reliable, or valid, or well calibrated) if, for any $p^* \in [0, 1]$,

$$\frac{\sum_{i=1,\ldots,n:\, p_i \approx p^*} y_i}{\sum_{i=1,\ldots,n:\, p_i \approx p^*} 1} \approx p^*, \qquad (7)$$

provided $\sum_{i=1,\ldots,n:\, p_i \approx p^*} 1$ is not too small.
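Since (7) is informal, any computational check has to fix what "$p_i \approx p^*$" means. The sketch below assumes a crisp window $|p_i - p^*| \le \delta$ (our choice; the text goes on to replace such crisp windows by continuous test functions) and reports, for each $p^*$ in a grid, the number of such rounds and the empirical frequency of $y_i = 1$. The function name `calibration_table` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_table(ps: Sequence[float], ys: Sequence[int],
                      centers: Sequence[float], radius: float
                      ) -> List[Tuple[float, int, float]]:
    """Hypothetical helper: for each p* return (p*, count, mean of y_i over |p_i - p*| <= radius)."""
    rows = []
    for c in centers:
        selected = [y for p, y in zip(ps, ys) if abs(p - c) <= radius]
        if selected:                     # skip p* values that were never forecast
            rows.append((c, len(selected), sum(selected) / len(selected)))
    return rows


if __name__ == "__main__":
    ps = [0.5, 0.5, 0.9, 0.1, 0.52]
    ys = [1, 0, 1, 0, 1]
    for row in calibration_table(ps, ys, [0.1, 0.5, 0.9], radius=0.05):
        print(row)
```

Forecasts pass this informal check when, in rows with a large count, the last entry is close to the first.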

Let us first consider just one value for $p^*$. Instead of the "crisp" point $p^*$
we will consider a "fuzzy point" $I : [0, 1] \to [0, 1]$ such that $I(p^*) = 1$ and
$I(p) = 0$ for all $p$ outside a small neighborhood of $p^*$. A standard choice would
be something like $I := I_{[p_-, p_+]}$, where $[p_-, p_+]$ is a short interval containing $p^*$
and $I_{[p_-, p_+]}$ is its indicator function, but we will want $I$ to be continuous (it
can, however, be arbitrarily close to $I_{[p_-, p_+]}$).

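The strategy for Skeptic ensuring (5) can be modified as follows. Let
$\epsilon \in (0, 0.5]$ be again a small number. Now we consider the strategy
$S_n(p) = S_n^{\epsilon,I}(p) := \epsilon I(p) \mathcal{K}_{n-1}$. Since

$$\mathcal{K}_n = \prod_{i=1}^{n} \bigl(1 + \epsilon I(p_i)(y_i - p_i)\bigr),$$

on the paths where $\mathcal{K}_n$ is bounded, say $\mathcal{K}_n \le C$, we have

$$\begin{aligned}
\prod_{i=1}^{n} \bigl(1 + \epsilon I(p_i)(y_i - p_i)\bigr) &\le C,\\
\sum_{i=1}^{n} \ln\bigl(1 + \epsilon I(p_i)(y_i - p_i)\bigr) &\le \ln C,\\
\epsilon \sum_{i=1}^{n} I(p_i)(y_i - p_i) - \epsilon^2 \sum_{i=1}^{n} I^2(p_i)(y_i - p_i)^2 &\le \ln C,\\
\epsilon \sum_{i=1}^{n} I(p_i)(y_i - p_i) - \epsilon^2 \sum_{i=1}^{n} I(p_i) &\le \ln C
\end{aligned}$$

(the last step involves replacing $I^2(p_i)(y_i - p_i)^2$ with $I(p_i)$, which keeps the
inequality valid because $I^2(p_i)(y_i - p_i)^2 \le I(p_i)$; the loss of precision is not
great if $I$ is close to $I_{[p_-, p_+]}$),

$$\frac{\sum_{i=1}^{n} I(p_i)(y_i - p_i)}{\sum_{i=1}^{n} I(p_i)} \le \frac{\ln C}{\epsilon \sum_{i=1}^{n} I(p_i)} + \epsilon.$$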

The last inequality (together with its counterpart for $-\epsilon$) shows that the mean
of $y_i$ for $p_i$ close to $p^*$ is close to $p^*$ provided we have observed sufficiently
many such $p_i$; its interpretation is especially simple when $I$ is close to $I_{[p_-, p_+]}$.
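The localized bound can be checked numerically in the same way as (2). The sketch assumes a triangular ("tent") fuzzy point $I$, one continuous choice among many; the helper names `tent`, `capital_eps_I` and `localized_bound` are hypothetical. It evaluates both sides of the last inequality with $C = \mathcal{K}_n$; the inequality holds exactly, since $\ln(1 + t) \ge t - t^2$ for $|t| \le 0.5$ and $I^2(p_i)(y_i - p_i)^2 \le I(p_i)$.

```python
import math
import random


def tent(center, half_width):
    """Hypothetical fuzzy point: 1 at center, falling linearly to 0 at distance half_width."""
    return lambda p: max(0.0, 1.0 - abs(p - center) / half_width)


def capital_eps_I(ps, ys, eps, I):
    """Hypothetical helper: K_n of S^{eps,I}, i.e. prod_i (1 + eps I(p_i) (y_i - p_i))."""
    k = 1.0
    for p, y in zip(ps, ys):
        k *= 1 + eps * I(p) * (y - p)
    return k


def localized_bound(ps, ys, eps, I):
    """Return (I-weighted mean of y_i - p_i, ln(K_n) / (eps * sum_i I(p_i)) + eps)."""
    w = sum(I(p) for p in ps)
    if w == 0:
        raise ValueError("no forecasts in the support of I")
    lhs = sum(I(p) * (y - p) for p, y in zip(ps, ys)) / w
    rhs = math.log(capital_eps_I(ps, ys, eps, I)) / (eps * w) + eps
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    ps = [rng.random() for _ in range(2000)]
    # Labels that are 1 too often near p* = 0.3.
    ys = [1 if rng.random() < (0.6 if abs(p - 0.3) < 0.05 else p) else 0 for p in ps]
    print(localized_bound(ps, ys, 0.1, tent(0.3, 0.05)))
```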

In general, we may consider a mixture of $S^{\epsilon,I}(p)$ and $S^{-\epsilon,I}(p)$ for different
values of $\epsilon$ and for different $I$ covering all $p^* \in [0, 1]$. If we make sure that the
mixture is continuous (which is always the case for continuous $I$ and finitely
many $\epsilon$ and $I$), Theorem 1 provides us with forecasts that are unbiased in the
small.



4.3 Using the objects 

Unbiasedness, even in the small, is only a necessary but far from sufficient
condition for good forecasts: for example, a forecaster who ignores the objects $x_n$
can be perfectly calibrated, no matter how much useful information the $x_n$ contain.
(Cf. the discussion of resolution in […]; we prefer not to use the term "resolution",
which is too closely connected with the very special way of probability
forecasting based on sorting and labeling.) It is easy to make the algorithm
of the previous subsection take the objects into account: we can allow the test
functions $I$ to depend not only on $p$ but also on the current object $x_n$; $S_n(p)$
then becomes a mixture of

$$S_n^{\epsilon,I}(p) := \epsilon I(p, x_n) \prod_{i=1}^{n-1} \bigl(1 + \epsilon I(p_i, x_i)(y_i - p_i)\bigr)$$

and $S_n^{-\epsilon,I}(p)$ (defined analogously) over $\epsilon$ and $I$.
4.4 Relation to a standard counter-example



Suppose, for simplicity, that objects are absent ($|\mathbf{X}| = 1$). The standard
construction from Dawid […] showing that no forecasting strategy produces forecasts
$p_n$ that are unbiased in the small for all sequences is as follows. Define an infinite
sequence $y_1, y_2, \ldots$ recursively by

$$y_n := \begin{cases} 1 & \text{if } p_n < 0.5,\\ 0 & \text{otherwise}, \end{cases}$$

where $p_n$ is the forecast produced by the forecasting strategy after seeing
$y_1, \ldots, y_{n-1}$. For the forecasts $p_n < 0.5$ we always have $y_n = 1$ and for the
forecasts $p_n \ge 0.5$ we always have $y_n = 0$; obviously, we do not have unbiasedness
in the small.
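Dawid's construction is easy to run against any concrete forecasting strategy. As an illustration (our choice, not part of the construction itself), the sketch applies it to the Laplace forecasting strategy $p_n = (k + 1)/(n + 1)$ used in Section 5, where $k$ is the number of 1s among $y_1, \ldots, y_{n-1}$. The function names `dawid_sequence` and `laplace` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def dawid_sequence(forecaster: Callable[[Sequence[int]], float],
                   n_rounds: int) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: y_n := 1 if p_n < 0.5 else 0, with p_n = forecaster(y_1..y_{n-1})."""
    ps: List[float] = []
    ys: List[int] = []
    for _ in range(n_rounds):
        p = forecaster(ys)
        ps.append(p)
        ys.append(1 if p < 0.5 else 0)
    return ps, ys


def laplace(ys: Sequence[int]) -> float:
    """Hypothetical helper: Laplace's rule, (k + 1) / (len(ys) + 2) = (k + 1) / (n + 1) on round n."""
    return (sum(ys) + 1) / (len(ys) + 2)


if __name__ == "__main__":
    ps, ys = dawid_sequence(laplace, 8)
    for p, y in zip(ps, ys):
        print(f"p = {p:.3f}  y = {y}")
```

Here the Laplace forecasts alternate between exactly 0.5 (always followed by a 0) and values just below 0.5 (always followed by a 1), so they approach 0.5 while remaining badly miscalibrated on each side.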

Let us see what Dawid's construction gives when applied to the defensive
forecasting strategy constructed from the mixture of $S^{\epsilon,I}(p)$ and $S^{-\epsilon,I}(p)$, as
described above, over different $\epsilon$ and different $I$; we will assume not only that
the test functions $I$ cover all of $[0, 1]$ but also that each point $p \in [0, 1]$ is covered
by arbitrarily narrow (concentrated in a small neighborhood of $p$) test functions. It
is clear that we will inevitably have $p_n \to 0.5$ if the $p_n$ are produced by the defensive
forecasting strategy and the $y_n$ are produced by Dawid's construction. On the other
hand, since all test functions $I$ are continuous and so cannot sharply distinguish
between the cases $p_n < 0.5$ and $p_n \ge 0.5$, we do not have any contradiction:
neither the test functions nor any observer who can only measure the $p_n$ with
a finite precision can detect the lack of unbiasedness in the small.

In this paper we are only interested in unbiasedness in the small when the
test functions $I$ are required to be continuous. Dawid's construction shows that
unbiasedness in the small is impossible to achieve if the $I$ are allowed to be
indicator functions of intervals (such as $[0, 0.5)$ and $[0.5, 1]$). To achieve
unbiasedness in the small in this stronger sense, randomization appears necessary
(see, e.g., [18]). It is interesting that already a little bit of randomization
suffices, as explained in […].

5 Simplified algorithm 

Let us assume first that objects are absent, $|\mathbf{X}| = 1$. It was observed empirically
that the performance of defensive forecasting strategies with a fixed $\epsilon$ does not
depend much on $\epsilon$ (provided it is not too large; e.g., in the above calculations
we assumed $\epsilon \le 0.5$). This suggests letting $\epsilon \to 0$ (in particular, we will assume
that $\epsilon \ll n^{-2}$). As the test functions $I$ we will take Gaussian bells $I_j$ with
standard deviation $\sigma > 0$ located densely and uniformly in the interval $[0, 1]$.
Letting $\approx$ stand for approximate equality and using the shorthand
$\sum_{\pm} f(\pm) := f(+) + f(-)$, we obtain:

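$$\begin{aligned}
S_n(p) &= \sum_{\pm} \sum_j (\pm\epsilon) I_j(p) \prod_{i=1}^{n-1} \bigl(1 \pm \epsilon I_j(p_i)(y_i - p_i)\bigr)\\
&= \sum_{\pm} \sum_j (\pm\epsilon) I_j(p) \exp\left( \sum_{i=1}^{n-1} \ln\bigl(1 \pm \epsilon I_j(p_i)(y_i - p_i)\bigr) \right)\\
&\approx \sum_{\pm} \sum_j (\pm\epsilon) I_j(p) \exp\left( \pm\epsilon \sum_{i=1}^{n-1} I_j(p_i)(y_i - p_i) \right)\\
&\approx \sum_{\pm} \sum_j (\pm\epsilon) I_j(p) \left( 1 \pm \epsilon \sum_{i=1}^{n-1} I_j(p_i)(y_i - p_i) \right)\\
&= 2\epsilon^2 \sum_j I_j(p) \sum_{i=1}^{n-1} I_j(p_i)(y_i - p_i)\\
&\propto \sum_{i=1}^{n-1} K(p, p_i)(y_i - p_i), \qquad (8)
\end{aligned}$$

where $K(p, p_i)$ is the Mercer kernel

$$K(p, p_i) := \sum_j I_j(p) I_j(p_i).$$

(Only the sign and the zeros of $S_n$ matter to Forecaster, so the positive factor
$2\epsilon^2$ can be dropped.)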
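The passage from the mixture to the kernel form (8) can be checked numerically for small $\epsilon$. The sketch assumes $m$ unnormalized Gaussian bells $I_j(p) = \exp(-(p - t_j)^2/(2\sigma^2))$ with centers $t_j = (j + 0.5)/m$ (the exact placement is our choice), evaluates the exact mixture $\sum_{\pm}\sum_j (\pm\epsilon) I_j(p) \prod_i (1 \pm \epsilon I_j(p_i)(y_i - p_i))$, and compares it, after division by $2\epsilon^2$, with $\sum_i K(p, p_i)(y_i - p_i)$. The function names are hypothetical.

```python
import math
from typing import Callable, List


def gaussian_bells(m: int, sigma: float) -> List[Callable[[float], float]]:
    """Hypothetical helper: I_j(p) = exp(-(p - t_j)^2 / (2 sigma^2)), t_j = (j + 0.5) / m."""
    centers = [(j + 0.5) / m for j in range(m)]
    return [lambda p, t=t: math.exp(-(p - t) ** 2 / (2 * sigma ** 2)) for t in centers]


def s_mixture(p, ps, ys, bells, eps):
    """Exact mixture: sum over +/- and j of (+-eps) I_j(p) prod_i (1 +- eps I_j(p_i)(y_i - p_i))."""
    total = 0.0
    for I in bells:
        for sign in (1.0, -1.0):
            prod = 1.0
            for pi, yi in zip(ps, ys):
                prod *= 1 + sign * eps * I(pi) * (yi - pi)
            total += sign * eps * I(p) * prod
    return total


def s_kernel(p, ps, ys, bells):
    """Right-hand side of (8) with K(p, q) = sum_j I_j(p) I_j(q)."""
    return sum(sum(I(p) * I(pi) for I in bells) * (yi - pi) for pi, yi in zip(ps, ys))


if __name__ == "__main__":
    bells = gaussian_bells(20, 0.1)
    ps, ys, eps = [0.2, 0.7, 0.4], [1, 0, 1], 1e-3
    print(s_mixture(0.35, ps, ys, bells, eps) / (2 * eps ** 2), s_kernel(0.35, ps, ys, bells))
```

The two numbers differ only by terms of relative order $\epsilon^2$, since the second-order terms of the two products cancel in the difference.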

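This Mercer kernel can be approximated by

$$\begin{aligned}
\int_0^1 \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(t - p)^2}{2\sigma^2}\right) \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(t - p_i)^2}{2\sigma^2}\right) dt
&\propto \int_0^1 \exp\left(-\frac{(t - p)^2 + (t - p_i)^2}{2\sigma^2}\right) dt\\
&\approx \int_{-\infty}^{\infty} \exp\left(-\frac{(t - p)^2 + (t - p_i)^2}{2\sigma^2}\right) dt.
\end{aligned}$$

As a function of $p$, the last expression is proportional to the density of the sum
of two Gaussian random variables of variance $\sigma^2$; therefore, it is proportional to

$$\exp\left(-\frac{(p - p_i)^2}{4\sigma^2}\right).$$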
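A quick numerical check of this approximation: the sketch replaces the sum over densely placed normalized bells by a midpoint Riemann sum over $[0, 1]$ with spacing $1/m$, scaled by $1/m$ (our normalization choice), and compares it with the closed form $\exp(-(p - p_i)^2/(4\sigma^2))/(2\sqrt{\pi}\,\sigma)$, which is the $N(0, 2\sigma^2)$ density at $p - p_i$. The check is accurate only when $p$ and $p_i$ lie well inside $[0, 1]$ relative to $\sigma$. The function names are hypothetical.

```python
import math


def bell_sum(p: float, q: float, sigma: float, m: int) -> float:
    """Hypothetical helper: (1/m) sum_j phi_sigma(t_j - p) phi_sigma(t_j - q), t_j = (j + 0.5)/m."""
    norm = 1.0 / (2 * math.pi * sigma ** 2)   # product of the two normalizing constants
    total = 0.0
    for j in range(m):
        t = (j + 0.5) / m
        total += norm * math.exp(-((t - p) ** 2 + (t - q) ** 2) / (2 * sigma ** 2))
    return total / m


def gaussian_kernel_density(p: float, q: float, sigma: float) -> float:
    """N(0, 2 sigma^2) density at p - q: exp(-(p - q)^2 / (4 sigma^2)) / (2 sqrt(pi) sigma)."""
    return math.exp(-(p - q) ** 2 / (4 * sigma ** 2)) / (2 * math.sqrt(math.pi) * sigma)


if __name__ == "__main__":
    for q in (0.5, 0.45, 0.4, 0.35):
        print(q, bell_sum(0.5, q, 0.05, 4000), gaussian_kernel_density(0.5, q, 0.05))
```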

To get an idea of the properties of this forecasting strategy, which we call
the K29 strategy (or algorithm), we run it and the Laplace forecasting strategy
($p_n := (k + 1)/(n + 1)$, where $k$ is the number of 1s observed so far) on a randomly
generated bit sequence of length 1000 (with the probability of 1 equal to 0.5).
A zero point $p_n$ of $S_n$ was found using the simple bisection procedure (see, e.g.,
[…], §§9.2-9.4, for more sophisticated methods): (a) start with the interval $[0, 1]$;
(b) let $p$ be the mid-point of the current interval; (c) if $S_n(p) > 0$, remove the
left half of the current interval; otherwise, remove its right half; (d) go to (b).
We did 10 iterations, after which the mid-point of the remaining interval was
output as $p_n$. Notice that the values $S_n(0)$ and $S_n(1)$ did not have to be tested:
if $S_n$ is positive (negative) throughout, the procedure moves toward 1 (toward 0),
in line with the proof of Theorem 1. Our program was written in MATLAB, Version 7,
and the initial state of the random number generator was set to 0.

Figure 1: The First 1000 Probabilities Output by the K29 ($\sigma = 0.01$) and
Laplace Forecasting Strategies on a Randomly Generated Bit Sequence
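The experiment can be sketched in Python. This is not the authors' MATLAB program, and it uses a different random number generator, so it will not reproduce Figure 1 exactly. It follows the description above: Gaussian kernel with $\sigma = 0.01$, 10 bisection steps starting from $[0, 1]$, and the mid-point of the final interval as $p_n$. The function names are hypothetical; the pure-Python loop is quadratic in the sequence length, so the example uses shorter segments than the paper.

```python
import math
import random
from typing import List, Sequence


def k29_forecasts(ys: Sequence[int], sigma: float = 0.01, iters: int = 10) -> List[float]:
    """Hypothetical helper: K29 without objects, bisection on S_n(p) = sum_{i<n} K(p, p_i)(y_i - p_i)."""
    ps: List[float] = []
    for n in range(len(ys)):
        past = list(zip(ps, ys[:n]))           # (p_i, y_i) for i < n

        def S(p: float) -> float:
            # Far from all past forecasts the Gaussian kernel underflows to 0.0,
            # in which case step (c) takes the "otherwise" branch.
            return sum(math.exp(-(p - pi) ** 2 / (4 * sigma ** 2)) * (yi - pi)
                       for pi, yi in past)

        lo, hi = 0.0, 1.0                      # (a) start with [0, 1]
        for _ in range(iters):
            mid = (lo + hi) / 2                # (b) mid-point of the current interval
            if S(mid) > 0:
                lo = mid                       # (c) remove the left half ...
            else:
                hi = mid                       #     ... otherwise remove the right half
        ps.append((lo + hi) / 2)               # mid-point of the remaining interval
    return ps


def laplace_forecasts(ys: Sequence[int]) -> List[float]:
    """Hypothetical helper: p_n = (k + 1) / (n + 1), k = number of 1s among y_1..y_{n-1}."""
    ps, k = [], 0
    for n, y in enumerate(ys, start=1):
        ps.append((k + 1) / (n + 1))
        k += y
    return ps


if __name__ == "__main__":
    rng = random.Random(0)
    bits = [rng.randint(0, 1) for _ in range(200)]
    bits += [0] * 200 + [1] * 200              # shorter analogue of the second experiment
    k29, lap = k29_forecasts(bits), laplace_forecasts(bits)
    for n in (199, 399, 599):
        print(n + 1, round(k29[n], 3), round(lap[n], 3))
```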

Figure 1 shows that the probabilities output by the K29 ($\sigma = 0.01$) and
Laplace forecasting strategies are almost indistinguishable. To see that these
two forecasting strategies can behave very differently, we complemented the
1000 bits generated as described above with 1000 0s followed by 1000 1s. The
result is shown in Figure 2. The K29 strategy detects that the probability $p$ of
1 changes after the 1000th round, and fairly quickly moves down. When the
probability changes again after the 2000th round, K29 starts moving toward
$p = 1$, but interestingly, hesitates around the line $p = 0.5$, as if expecting the
process to revert to the original probability of 1, namely 0.5.

The Mercer kernel

$$K(p, p_i) = \exp\left(-\frac{(p - p_i)^2}{4\sigma^2}\right)$$

used in these experiments is known in machine learning as the Gaussian kernel
(in the usual parameterization $4\sigma^2$ is replaced by $2\sigma^2$ or $c$); however, many other
Mercer kernels also give reasonable results.
Figure 2: The Probabilities Output by the K29 ($\sigma = 0.01$) and Laplace Forecasting
Strategies on a Randomly Generated Sequence of 1000 Bits Followed by
1000 0s and 1000 1s

If we start from test functions $I$ depending on the object, instead of (8) we
will arrive at the expression

$$S_n(p) = \sum_{i=1}^{n-1} K\bigl((p, x_n), (p_i, x_i)\bigr)(y_i - p_i), \qquad (9)$$
where $K$ is a Mercer kernel on the squared product $([0, 1] \times \mathbf{X})^2$. There are
standard ways of constructing such Mercer kernels from Mercer kernels on $[0, 1]^2$
and $\mathbf{X}^2$ (see, e.g., the description of tensor products and direct sums in [18, 11]).
For $S_n$ to be continuous, we have to require that $K$ be forecast-continuous in
the following sense: for all $x \in \mathbf{X}$ and all $(p', x') \in [0, 1] \times \mathbf{X}$,
$K((p, x), (p', x'))$ is continuous as a function of $p$. The overall procedure can be
summarized as follows.

K29 Algorithm

Parameter: forecast-continuous Mercer kernel $K$ on $([0, 1] \times \mathbf{X})^2$
FOR n = 1, 2, ...:

Read $x_n \in \mathbf{X}$.

Define $S_n(p)$ as per (9).

Output any root $p$ of $S_n(p) = 0$ as $p_n$;

if there are no roots, $p_n := (1 + \operatorname{sign}(S_n))/2$.

Read $y_n \in \{0, 1\}$.
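A minimal sketch of the K29 Algorithm with objects, under stated assumptions. The kernel is taken to be the product of the Gaussian kernel in $p$ and a user-supplied Mercer kernel on $\mathbf{X}$ (a product of Mercer kernels is again a Mercer kernel, and this one is forecast-continuous). A root is searched for by bisection only when $S_n(0)$ and $S_n(1)$ bracket one; otherwise the sketch sets $p_n := (1 + \operatorname{sign}(S_n))/2$ using the common sign of $S_n(0)$ and $S_n(1)$. That simplification may miss interior roots, but by the argument in the proof of Theorem 1 it still never lets Skeptic's capital grow. All names are hypothetical.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical alias: a kernel on pairs (forecast, object).
Kernel = Callable[[Tuple[float, Any], Tuple[float, Any]], float]


def product_kernel(sigma: float, object_kernel: Callable[[Any, Any], float]) -> Kernel:
    """Hypothetical kernel: K((p, x), (q, z)) = exp(-(p - q)^2 / (4 sigma^2)) * object_kernel(x, z)."""
    def K(a, b):
        (p, x), (q, z) = a, b
        return math.exp(-(p - q) ** 2 / (4 * sigma ** 2)) * object_kernel(x, z)
    return K


def choose_forecast(S: Callable[[float], float], iters: int = 30) -> float:
    """Hypothetical helper: root of S if S(0), S(1) bracket one; else (1 + sign S)/2."""
    s0, s1 = S(0.0), S(1.0)
    if s0 == 0.0:
        return 0.0
    if s1 == 0.0:
        return 1.0
    if (s0 > 0) == (s1 > 0):         # no sign change at the endpoints
        return 1.0 if s0 > 0 else 0.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        s_mid = S(mid)
        if s_mid == 0.0:
            return mid
        if (s_mid > 0) == (s0 > 0):
            lo = mid                  # the sign change lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2


def k29(objects: Sequence[Any], labels: Sequence[int], K: Kernel) -> List[float]:
    """Hypothetical helper: run K29; each y_n is used only after p_n has been chosen."""
    history: List[Tuple[float, Any, int]] = []
    forecasts: List[float] = []
    for x, y in zip(objects, labels):                 # Read x_n
        def S(p: float) -> float:                     # S_n(p) as per (9)
            return sum(K((p, x), (pi, xi)) * (yi - pi) for pi, xi, yi in history)
        p = choose_forecast(S)                        # Output p_n
        forecasts.append(p)
        history.append((p, x, y))                     # Read y_n
    return forecasts


if __name__ == "__main__":
    same = lambda x, z: 1.0 if x == z else 0.0        # a simple Mercer kernel on X = {0, 1}
    xs = [0, 1] * 20
    print(k29(xs, xs, product_kernel(0.1, same))[-4:])  # label equals object
    # prints [0.0, 1.0, 0.0, 1.0]
```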






Computer experiments reported in […] show that the K29 algorithm performs
well on a standard benchmark data set. For a theoretical discussion of the K29
algorithm, see […] (Appendix) and [17].

6 Related work and directions of further research

This paper's methods connect two areas that have been developing independently
so far: probability forecasting and classical probability theory. It appears
that, when properly developed, these methods can benefit both areas:

• the powerful machinery of classical probability theory can be used for
probability forecasting;

• practical problems of probability forecasting may suggest new laws of
probability.

Classical probability theory started from Bernoulli's weak law of large numbers
(1713) and is the subject of countless monographs and textbooks. The
original statements of most of its results were for independent random variables,
but they were later extended to the martingale framework; the latter was
reduced to its game-theoretic core in [12]. The proof of the strong law of large
numbers used in this paper was extracted from Ville's martingale proof (see [12]) of
the law of the iterated logarithm (upper half).

The theory of probability forecasting was a topic of intensive research in
meteorology in the 1960s and 1970s; this research is summarized in […]. Machine
learning is still mainly concerned with categorical prediction, but the situation 
appears to be changing. Probability forecasting using Bayesian networks is a 
mature field; the literature devoted to probability forecasting using decision 
trees and to calibrating other algorithms is also fairly rich. So far, however, 
the field of probability forecasting has been developing without any explicit 
connections with classical probability theory. 

Defensive forecasting is indirectly related, in a sense dual, to prediction with
expert advice (reviewed in [15], §4) and its special case, Bayesian prediction. In
prediction with expert advice one starts with a given loss function and tries to 
make predictions that lead to a small loss as measured by that loss function. 
In defensive forecasting, one starts with a law of probability and then makes 
predictions such that this law of probability is satisfied. So the choice of the law 
of probability when designing the forecasting strategy plays a role analogous to 
the choice of the loss function in prediction with expert advice. 

In prediction with expert advice one combines a pool of potentially promising
forecasting strategies to obtain a forecasting strategy that performs not
much worse than the best strategies in the pool. In defensive forecasting one
combines strategies for Skeptic (such as the strategies corresponding to different
test functions $I$ and different $\pm\epsilon$ in (8)) to obtain one strategy achieving an
interesting goal (such as unbiasedness in the small); a strategy for Forecaster is
then obtained using Theorem 1. The possibility of mixing strategies for Skeptic
is as fundamental in defensive forecasting as the possibility of mixing strategies
for Forecaster in prediction with expert advice.

This paper continues the work started by Foster and Vohra […] and later
developed in, e.g., […] (the last paper replaces the von Mises-style framework
of the previous papers with a martingale framework, as in this paper).
The approach of this paper is similar to that of the recent paper [5], which also
considers deterministic forecasting strategies and continuous test functions for
unbiasedness in the small.

The main difference of this paper's approach from the bulk of work in learn- 
ing theory is that we do not make any assumptions about Reality's strategy. 

The following directions of further research appear to us most important: 

• extending Theorem 1 to other forecasting protocols (such as multi-label
classification) and designing efficient algorithms for finding the corresponding $p_n$;

• exploring forecasting strategies corresponding to: (a) Hoeffding's inequality,
(b) the central limit theorem, (c) the law of the iterated logarithm (all
we did in this paper was to slightly extend the strong law of large numbers
and then use it for probability forecasting).